Keir Fraser [Fri, 5 Jun 2009 08:26:39 +0000 (09:26 +0100)]
VT-d: remove useless variables
This patch removes the global variable "vtd_enabled", which is
redundant: "iommu_enabled" is enough. It also removes the unused global
variables qi_ctrl and ir_ctrl.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Fri, 5 Jun 2009 08:25:50 +0000 (09:25 +0100)]
Intel VT-d: fix Stoakley boot issue with iommu=1
Signed-off-by: Weidong Han <Weidong.han@intel.com>
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Keir Fraser [Thu, 4 Jun 2009 21:26:38 +0000 (22:26 +0100)]
docs: Note that changelog is not up to date for Xen 3.4+
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 4 Jun 2009 21:25:10 +0000 (22:25 +0100)]
x86: hap dirty vram tracking
Currently HAP systems suffer a significant performance loss when a vnc
client is connected or the sdl interface is used, because HAP lacks
an implementation of track_dirty_vram.
As a consequence qemu always tries to update the whole screen because
it does not know which areas of the screen have been updated by the
guest.
This patch implements track_dirty_vram for HAP enabling the logdirty
mechanism only in a specific gfn range and adding a
paging_log_dirty_range function that returns the log dirty bitmap in a
requested range.
paging_log_dirty_range differs from paging_log_dirty_op because it
operates on a range and because it does not pause the domain. In
order not to lose any update I moved clean_dirty_bitmap to the
beginning of the function, before evaluating the logdirty bitmap.
The bitmap is still safe because it is protected by the logdirty lock.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Thu, 4 Jun 2009 09:57:39 +0000 (10:57 +0100)]
xm: Remove redundant os.waitpid() call from do_console()
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 4 Jun 2009 09:48:45 +0000 (10:48 +0100)]
vtd: ia64 fix of intremap.c
19707:07cf79dfb59c caused compilation error on ia64.
This patch fixes it.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Thu, 4 Jun 2009 09:47:56 +0000 (10:47 +0100)]
xend: pci: only extract the exact pci BDFs
On some hosts:
[root@localhost ~]# ls /sys/bus/pci/devices/0000:00:05.0/
0000:00:05.0:pcie00 0000:05:00.0 class driver local_cpus
resource subsystem_vendor
0000:00:05.0:pcie01 broken_parity_status config enable modalias
subsystem uevent
0000:00:05.0:pcie02 bus device irq power
subsystem_device vendor
Here we should only get 0000:05:00.0, but we also get 0000:00:05.0
unexpectedly. With this patch, xend only extracts the exact BDF(s).
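A minimal sketch of exact BDF matching, written for illustration only. The
pattern and the helper name extract_bdfs are assumptions, not the code in
xend. Only directory entries that are a full domain:bus:slot.func address,
and nothing more, are kept:

```python
import re

# Assumed shape of a full PCI address: dddd:bb:ss.f, all hexadecimal.
BDF_RE = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")

def extract_bdfs(entries):
    """Return only the entries that are exactly one BDF (hypothetical helper)."""
    # fullmatch() rejects names that merely start with a BDF,
    # such as "0000:00:05.0:pcie00".
    return [name for name in entries if BDF_RE.fullmatch(name)]

entries = ["0000:00:05.0:pcie00", "0000:05:00.0", "class", "driver"]
print(extract_bdfs(entries))
```

A prefix match (re.match without an end anchor) would also accept the
pcie00 entry, which is the kind of false hit described above.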
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Thu, 4 Jun 2009 09:46:13 +0000 (10:46 +0100)]
xm: Don't die when trying to connect the console to short-lived domains
As observed by Mick Jordan, if a short-lived domain exits cleanly
then os.waitpid() will throw the following exception. This appears
to be because the child process that is used to start the domain
has detached from its parent.
OSError: [Errno 10] No child processes
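One way to tolerate this error can be sketched as follows. The helper name
wait_for_child is hypothetical and this is not the change made to xm; it
only shows how the "No child processes" case (ECHILD) can be told apart
from other failures of os.waitpid():

```python
import errno
import os

def wait_for_child(pid):
    """Wait for pid; return False if there is no such child to wait for."""
    try:
        os.waitpid(pid, 0)
        return True
    except OSError as exc:
        # ECHILD: the process is not (or is no longer) our child.
        if exc.errno == errno.ECHILD:
            return False
        raise  # anything else is a real error
```

Calling it with a pid that is not a child of the caller, for example
os.getpid(), takes the ECHILD branch on POSIX systems.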
Cc: Mick Jordan <Mick.Jordan@sun.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:45:24 +0000 (10:45 +0100)]
blktap2: fix parallel Make.
Sub-makes in tools/blktap2/daemon/lib and tools/lvd/lib
can be triggered many times at the same time, which results in
weird link errors because one target is linking a library while
another target is trying to recreate the library.
This patch makes it invoke each sub-make only once.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Thu, 4 Jun 2009 09:43:44 +0000 (10:43 +0100)]
xm: pass-through: sort the output of xm pci-list
Other than being arguably more human readable,
this patch reconciles the output differences between
using Xen API and xmlrpc to manipulate domains.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:43:20 +0000 (10:43 +0100)]
xend: pass-through: Use AUTO_PHP_SLOT as unknown vslot
This fixes a few cases where 0 is still used for an unknown vslot.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:41:50 +0000 (10:41 +0100)]
xm: xen-api, pass-through: create: Use vslot for hotplug_slot
Using func for hotplug_slot is not correct. However, func is often
zero, and previously zero meant "please pick a vslot"; asking xend to
pick a vslot was the only method available.
This resolves the following error when using Xen API:
$ xm create hvm.conf
...
Internal error: Timed out waiting for device model action.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:41:13 +0000 (10:41 +0100)]
xend: xen-api, pass-through: Add create_dpci_from_sxp()
Move some duplicated code into create_dpci_from_sxp()
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:40:24 +0000 (10:40 +0100)]
xm, xend: xen-api: DPCI.get_hotplug_slot() returns a decimal
xm uses the following code to read pci information using Xen API:
ppci_ref = server.xenapi.DPCI.get_PPCI(dpci_ref)
ppci_record = server.xenapi.PPCI.get_record(ppci_ref)
dev = {
    "domain": int(ppci_record["domain"]),
    "bus": int(ppci_record["bus"]),
    "slot": int(ppci_record["slot"]),
    "func": int(ppci_record["func"]),
    "vslot": int(server.xenapi.DPCI.get_hotplug_slot(dpci_ref))
}
As the domain, bus, slot and func values are returned as string
representations of decimal, it makes sense for get_hotplug_slot() to
also return string representations of decimal.
As it is, the int() conversion will cause xm to fail with
an error if the vslot is in the range 0xa-0xf or 0x1a-0x1f.
$ xm pci-list debian
Error: Invalid argument.
And the int() conversion will return the wrong value if
the vslot is in the range 0x10-0x19.
This patch also alters XendDPCI to store hotplug_vslot as an integer
rather than a string. This is consistent with the way other
values are stored inside XendDPCI.
get_hotplug_slot() returning a string is not consistent
with other calls inside XendDPCI, which return integers.
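The two failure modes described above can be reproduced with plain int().
This is a sketch only; parse_vslot is a hypothetical stand-in for the
int(...) call in the xm snippet, and the hex strings are assumed to be
rendered without a 0x prefix:

```python
def parse_vslot(text):
    """Parse a vslot as a decimal string, as the xm snippet does."""
    return int(text)

# A vslot of 0x1a rendered as hex contains a letter, so int() rejects it.
try:
    parse_vslot("%x" % 0x1a)
    hex_failed = False
except ValueError:
    hex_failed = True
print(hex_failed)

# A vslot of 0x10 rendered as hex is "10", which int() reads as ten.
print(parse_vslot("%x" % 0x10))

# Rendered as decimal, the value round-trips.
print(parse_vslot("%d" % 0x1a) == 0x1a)
```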
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:39:32 +0000 (10:39 +0100)]
xend: pass-through: prefix vslot with 0x in device configuration
I don't know of the historical reasons for this, but by convention
hex values are stored without a leading '0x' in the backend and
with a leading '0x' in the device configuration.
This patch also removes handling of the case where vslot is missing
from the backend, which should never occur.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:39:03 +0000 (10:39 +0100)]
xm: xen-api, pass-through: Don't pass empty opts
Internally xend doesn't know how to handle empty opts.
This code ensures that opts is only included in the sxpr
if its value will be non-empty.
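As a rough illustration of this rule, the sketch below adds an opts entry
to a nested-list s-expression only when there is something to put in it.
The function name build_dev_sxpr, the list layout and the option name
example_opt are all hypothetical and are not taken from xm or xend:

```python
def build_dev_sxpr(dev_fields, opts):
    """Build a nested-list s-expression for one device (assumed layout)."""
    sxpr = ["dev"]
    for key, value in dev_fields:
        sxpr.append([key, value])
    # Only emit an opts entry when it would be non-empty.
    if opts:
        sxpr.append(["opts"] + [[key, value] for key, value in opts])
    return sxpr

print(build_dev_sxpr([("bus", "0x05")], []))
print(build_dev_sxpr([("bus", "0x05")], [("example_opt", "1")]))
```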
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:38:13 +0000 (10:38 +0100)]
xm: xen-api: Install create.dtd in SHAREDIR
* Install create.dtd in SHAREDIR
* Use SHAREDIR/create.dtd
* import os.path.join into xenapi_create.py,
it already seems to be used many times
Resolves the following error when using XenAPI:
$ xm create hvm.conf
Couldn't open resource '/usr/share/xen/create.dtd' at
/usr/share/xen/create.dtd:1:0
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:37:39 +0000 (10:37 +0100)]
xend: pass-through: report attach errors from device model
Cc: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Cc: Edwin Zhai <edwin.zhai@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 4 Jun 2009 09:36:36 +0000 (10:36 +0100)]
libxc: fix link error on ia64
On ia64, xen-unstable 19698:f72d26c00002 cannot be built:
../../tools/libxc/libxenguest.so: undefined reference to
`xc_core_arch_map_p2m_writable'
../../tools/libxc/libxenguest.so: undefined reference to `xc_map_m2p'
Because xc_offline_page.c requires xc_map_m2p() in xc_domain_save.c,
xc_offline_page.c must be compiled only if CONFIG_MIGRATE=y.
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Thu, 4 Jun 2009 09:36:01 +0000 (10:36 +0100)]
rombios: compute checksum for roms bigger than a segment
From: Glauber Costa <glommer@redhat.com>
From: "Sebastian Herbszt" <herbszt@gmx.de>
Ported by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Keir Fraser [Thu, 4 Jun 2009 09:35:03 +0000 (10:35 +0100)]
minios: Introduce BSD license COPYING file
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 3 Jun 2009 17:27:05 +0000 (18:27 +0100)]
minios: Clean up and remove Linux remnants from x86_64.S
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 3 Jun 2009 15:20:28 +0000 (16:20 +0100)]
Keir Fraser [Wed, 3 Jun 2009 15:12:34 +0000 (16:12 +0100)]
hvmloader: Scan for gpxe-capable NICs until one is found.
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 3 Jun 2009 13:40:34 +0000 (14:40 +0100)]
x86: Clean up get_page_from_l1e() to correctly distinguish between
owner-of-pte and owner-of-data-page in all cases.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 3 Jun 2009 11:59:44 +0000 (12:59 +0100)]
vtd: Fix apic pin to interrupt remapping table index
Originally, ioapic_rte_to_remap_entry() called xmalloc to set the
index. When built with debug=y, this may trigger a
spinlock BUG_ON because memory is allocated with interrupts disabled.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 3 Jun 2009 11:35:25 +0000 (12:35 +0100)]
x86: pin_2_irq[].pin should be initialised to -1.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 3 Jun 2009 10:20:38 +0000 (11:20 +0100)]
typo: occured -> occurred
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Wed, 3 Jun 2009 10:19:51 +0000 (11:19 +0100)]
xend: requested_vslots is no longer needed
...following removal of the boot-time pci passthru protocol.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Wed, 3 Jun 2009 10:17:00 +0000 (11:17 +0100)]
x86: Fix XENPF_getidletime to correctly modify cpumask.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 3 Jun 2009 10:11:50 +0000 (11:11 +0100)]
blktap: fix empty QCOW images (bug 1430 part 2)
Empty QCOW images consist of only the L1 table, which results in a
file size that is not sector-aligned. Since blktap uses O_DIRECT, the
block-aligned read of the L1 table goes beyond the end of the file and
thus returns the actual file size rather than the expected length.
This patch checks whether at least the L1 table has been read.
This should fix bug 1430.
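The check can be sketched as follows. This is an illustration under stated
assumptions, not the blktap code: the 512-byte sector size, the function
name read_l1_table and the in-memory image are all made up for the example.
The read length is rounded up to a sector multiple, and the read is
accepted as long as at least the L1 table itself came back:

```python
import io

SECTOR_SIZE = 512  # assumed alignment unit

def read_l1_table(image, l1_offset, l1_size):
    """Read the L1 table with a sector-aligned length (hypothetical helper)."""
    aligned_size = (l1_size + SECTOR_SIZE - 1) // SECTOR_SIZE * SECTOR_SIZE
    image.seek(l1_offset)
    data = image.read(aligned_size)
    # A short read is fine as long as the whole L1 table was returned.
    if len(data) < l1_size:
        raise IOError("short read: got %d of %d L1 bytes" % (len(data), l1_size))
    return data[:l1_size]

# An "empty" image: one sector of header followed by a 100-byte L1 table.
empty_image = io.BytesIO(b"\0" * (SECTOR_SIZE + 100))
print(len(read_l1_table(empty_image, SECTOR_SIZE, 100)))
```

Comparing the returned length against the aligned size instead would
reject every empty image, since the file ends before the next sector
boundary.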
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Wed, 3 Jun 2009 10:11:04 +0000 (11:11 +0100)]
blktap: fix and use ROUNDUP macro (bug 1430 part 1)
As pointed out in Xen Bugzilla 1430, the rounding function in the
blktap QCOW driver is wrong in line 824 of block-qcow.c.
This patch replaces this (and other roundings) with the already
existing ROUNDUP macro (and fixes the usual macro pitfall).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Wed, 3 Jun 2009 10:10:07 +0000 (11:10 +0100)]
blktap2: human readable output for tapdisk2 creation problems
This patch fixes the "file object has no attribute find" failure
we've been seeing when starting blktap2 devices and adds more
meaningful error output to conditions where the tapdisk2 process is
unable to create a blktap2 device.
Signed-off-by: Dutch Meyer <dmeyer@cs.ubc.ca>
Keir Fraser [Wed, 3 Jun 2009 10:09:14 +0000 (11:09 +0100)]
minios: refactor xenbus state machine
Implement xenbus_wait_for_state_change and xenbus_switch_state and
change the various frontends to use the two functions and do proper
error checking.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 2 Jun 2009 10:50:16 +0000 (11:50 +0100)]
xend: pci: improve the assignability checking
1) fix some small typos in util/pci.py;
2) find_all_the_multi_functions(): BDFs of a multi-function PCIe
device could be different in all the 3 fields (bus, device, function),
so we need to call self.find_parent() and list all the BDFs below the
parent;
3) to assign a device of the must-be-co-assigned devices, we require
that all the related devices be owned by pciback;
4) detect and disallow duplicate pci strings specified in the guest
config file due to carelessness.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Tue, 2 Jun 2009 10:49:34 +0000 (11:49 +0100)]
Enable pci mmcfg and ATS for x86_64
This patch enables PCI MMCONFIG in xen and turns on hooks for ATS.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Keir Fraser [Mon, 1 Jun 2009 17:37:27 +0000 (18:37 +0100)]
tmem: shared ephemeral (SE) pool (clustering) fixes
Tmem can share clean page cache pages for Linux domains
in a virtual cluster (currently only the ocfs2 filesystem
has a patch on the Linux side). So when one domain
"puts" (evicts) a page, any domain in the cluster can
"get" it, thus saving disk reads. This functionality
is already present; these are only bug fixes.
- fix bugs when an SE pool is destroyed
- fixes in parsing tool for xm tmem-list output for SE pools
- incorrect locking in one case for destroying an SE pool
- clearer verbosity for transfer when an SE pool is destroyed
- minor cleanup: merge routines that are mostly duplicate
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 1 Jun 2009 14:52:19 +0000 (15:52 +0100)]
libxc: Implement stub xc_gnttab_map_table() for non-linux.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 1 Jun 2009 13:55:32 +0000 (14:55 +0100)]
Revert 19658:28a197617286 "Fix up the synchronisation around grant
table map track handles".
There is no race since the hypercall takes the
domain-lock. Furthermore removing locking from get_maptrack_handle()
races gnttab_setup_table().
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 1 Jun 2009 13:39:25 +0000 (14:39 +0100)]
minios: Remove Linux attribution for mktime() as it's not true since c/s 19638.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 1 Jun 2009 13:18:44 +0000 (14:18 +0100)]
Keir Fraser [Mon, 1 Jun 2009 13:15:48 +0000 (14:15 +0100)]
libxc: Exchange a page for PV guest
This patch supports exchanging a page for a suspended PV guest from user
space.
The basic idea to offline a page is:
1) Mark the page offline pending.
2) If the page is owned by an HVM domain, the user has to live migrate it.
In future, with stub-domain support, we can also exchange the page
without migration.
3) If the page is owned by a PV domain, we will try to exchange the
offline pending page for a new one and free the old page.
This patch achieves item 3.
The method to exchange the offline pending page for a PV domain is:
1) Suspend the guest.
2) If the page is being granted out, return with offline pending.
3) Get a copy of the content.
4) Scan all page table pages for references to the offending
page; if any are found, make the entry non-present to reduce the
reference count.
5) After updating all page tables, user space tools will try to exchange
the old page. If the new mfn has no reference anymore (i.e.
count_info & count_mask = 1), the exchange will allocate a new page,
update the m2p and return success; otherwise it will return failure.
6) If step 5 succeeds, user space tools will update the content of
the new page, change the p2m table, and change all entries scanned in
step 4 to point to the new entry.
If step 5 fails, it will try to undo step 4 to revert the page tables.
7) Resume the guest.
Please refer to thread in
http://www.mailinglistarchive.com/xen-devel@lists.xensource.com/msg63084.html
for more information.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Mon, 1 Jun 2009 13:13:53 +0000 (14:13 +0100)]
libxc: Export xc_core_arch_map_p2m_writable()
This patch first changes xc_core_arch_map_p2m() to map the p2m as
writable, then exports this function.
Note that the caller must make sure that changing the p2m
in flight will not cause trouble.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Mon, 1 Jun 2009 13:13:20 +0000 (14:13 +0100)]
libxc: Add a function to map a domain's grant table.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Mon, 1 Jun 2009 13:12:53 +0000 (14:12 +0100)]
libxc: export xc_map_m2p() so that it can be called outside.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Mon, 1 Jun 2009 13:08:58 +0000 (14:08 +0100)]
Export page offline hypercalls to user space tools.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Mon, 1 Jun 2009 13:07:46 +0000 (14:07 +0100)]
tmem: fix corner case crash on forcible domain destruction
When a tmem-enabled domain is destroyed, if the domain was
using a persistent pool, the part of the domain destruction process
that scrubs pages races tmem's attempts to gracefully dismantle
data structures. Move tmem_destroy earlier in the domain
destruction process.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 1 Jun 2009 13:02:26 +0000 (14:02 +0100)]
blktap: Revert parts of c/s 19349.
Caused blktapctrl pipes to be created with uninitialised variable in name.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sat, 30 May 2009 12:25:32 +0000 (13:25 +0100)]
xend: Fix HVM domain restore (undefined HVM_ImageHandler.superpages).
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sat, 30 May 2009 12:21:08 +0000 (13:21 +0100)]
Revert 19657:9ff5c79b0ceb
Breaks automated localhost migration tests:
for xx in x:
TypeError: iteration over non-sequence
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sat, 30 May 2009 09:24:21 +0000 (10:24 +0100)]
passthrough: Fix test_and_clear_bit() caller to clear bitmap, not bitmap pointer
Latent bug triggered by '19650: eliminate hard-coded NR_IRQS'
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 29 May 2009 08:33:06 +0000 (09:33 +0100)]
xend: Add serialise_pci_opts() and split_pci_opts()
This centralises some code.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Fri, 29 May 2009 08:32:40 +0000 (09:32 +0100)]
xend: Fix check for request to detach non-existent device
This fixes the check for a request to detach a non-existent device
in pci_device_configure. The previous check was bogus because the
format of AUTO_PHP_SLOT_STR is not the same as that of x['vslot'].
However, it works in a slightly non-obvious way, checking that vslot
hasn't been altered from its initial value AUTO_PHP_SLOT_STR. To
make this scheme a little clearer, use an empty string as the initial
value.
Formatting issues aside, neither AUTO_PHP_SLOT_STR nor the empty
string is a valid value for x['vslot']. In the event
of invalid data (indicating a bug), it should be caught by
self.hvm_destroyPCIDevice.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Fri, 29 May 2009 08:32:02 +0000 (09:32 +0100)]
xend: hot-plug PCI devices at boot-time
Currently there are two interfaces to pass-through PCI devices:
1. A method driven through per-device xenstore entries that is used at
boot-time
2. An event-based method used for hot-plug.
This seems somewhat redundant and makes extending the code cumbersome
and prone to error - often the change needs to be made twice, in
two different ways.
This patch unifies PCI pass-through by using the existing event-based
method at boot-time.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Fri, 29 May 2009 08:29:58 +0000 (09:29 +0100)]
xend: use popen2 module instead of subprocess for Python 2.3
On Python 2.3, xend cannot be started:
File
"usr/lib/python2.3/site-packages/xen/xend/server/BlktapController.py", line 3, in ?
import subprocess
ImportError: No module named subprocess
This patch uses `popen2' instead of `subprocess' for Python 2.3.
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Fri, 29 May 2009 08:28:15 +0000 (09:28 +0100)]
blktap2: fix a compilation error (missing PATH_MAX)
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Fri, 29 May 2009 08:27:31 +0000 (09:27 +0100)]
xm: Unify the output of pci-list
This is another attempt at having pci-list produce consistent output.
Without this change, differences in the output of both vslots
and domain occur between domains that have never been started and
domains that have been started.
In order to address this I have taken the approach of
using integers where possible and explicitly formatting them,
rather than relying on string representations that are present in
data structures.
I have also re-used the common part of the format, to try
and mitigate the possibility of future inconsistencies there.
This patch also:
* Removes trailing whitespace
* Removes unnecessary brackets and whitespace from print invocations
* Prints the header outside of the loop to avoid having
to maintain a state variable
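The approach of converting to integers and formatting explicitly can be
sketched like this. The format string, the field widths and the helper
names are assumptions made for illustration and are not copied from xm:

```python
# Hypothetical row layout; the real pci-list columns may differ.
ROW_FORMAT = "0x%(vslot)02x 0x%(domain)04x 0x%(bus)02x 0x%(slot)02x 0x%(func)x"

def to_int(value):
    """Accept an int, a decimal string, or a 0x-prefixed hex string."""
    if isinstance(value, int):
        return value
    text = value.strip().lower()
    if text.startswith("0x"):
        return int(text, 16)
    return int(text, 10)

def format_row(dev):
    """Format one device from integers, whatever form the input took."""
    return ROW_FORMAT % dict((key, to_int(value)) for key, value in dev.items())

# The same device as it might appear for a never-started and a started domain.
never_started = {"vslot": "0x6", "domain": "0", "bus": "0x05", "slot": "0", "func": "0"}
started = {"vslot": 6, "domain": 0, "bus": 5, "slot": 0, "func": 0}
print(format_row(never_started) == format_row(started))
```

Because both rows pass through the same format string, any later change
to the layout applies to both kinds of domain at once.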
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Fri, 29 May 2009 08:26:49 +0000 (09:26 +0100)]
Free pirq_array/pirq_to_evtchn in complete_domain_destroy().
Also rejig code slightly in domain_create().
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 29 May 2009 08:22:50 +0000 (09:22 +0100)]
Revert 19661:326b24bfa9f9 "Free pirq_to_evtchn/pirq_mask..."
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 29 May 2009 08:19:30 +0000 (09:19 +0100)]
[VTD] laying the ground work for ATS
These changes lay the ground work for ATS enabling in Xen. They will be
followed by a patch which enables PCI MMCFG, which is needed for actually
enabling ATS functionality.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Keir Fraser [Thu, 28 May 2009 10:07:19 +0000 (11:07 +0100)]
Serialize iptables calls in hotplug scripts
iptables cannot correctly handle situations when more than one command
is trying to set netfilter rules. In such situations, iptables may fail
with EAGAIN, which results in "iptables: Unknown error
18446744073709551615".
Such a situation can easily happen when multiple network devices are
configured for a domain as vif hotplug scripts are called in parallel
for all of the network devices.
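The general idea of serialising such calls can be sketched with an
exclusive file lock. The hotplug scripts themselves are shell, so this
Python version is an illustration only; the lock path and the function
name run_serialized are hypothetical:

```python
import fcntl
import os
import subprocess
import tempfile

# Hypothetical lock file shared by every caller that must be serialised.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example-iptables.lock")

def run_serialized(argv, lock_path=LOCK_PATH):
    """Run argv while holding an exclusive lock; return its exit status."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call(argv)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each parallel script invocation would wrap its iptables command line in
run_serialized(), so that only one of them touches the netfilter rules at
a time.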
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Keir Fraser [Thu, 28 May 2009 10:01:00 +0000 (11:01 +0100)]
blktap2: Fix build with gcc3. gcc3 cannot handle defining a function which
is passed a struct-by-value which is not yet fully defined. Thus
defining a request struct which contains a pointer to a function which
is passed-by-value an instance of that request structure is
impossible. We work around it by defining the function pointer as
void* and then casting in one place.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 28 May 2009 09:19:15 +0000 (10:19 +0100)]
x86: Fix flush_area_mask() and on_selected_cpus() to not race updates
of the supplied cpumask (which is now passed by reference).
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 28 May 2009 09:06:01 +0000 (10:06 +0100)]
xend: Update info['platform']['pci']
This patch updates info['platform']['pci'] for PCI devices
assignment to domains.
When a domain is started, xend confirms by using xc.test_assign_device
whether PCI devices can be assigned to the domain. For the
confirmation, info['platform']['pci'] must be an appropriate value.
However, info['platform']['pci'] may not be appropriate, because it
is usually not updated even if the PCI
device configuration of the domain was changed by using xm
pci-attach/detach. This patch updates info['platform']['pci'] to the
appropriate value when domains are started.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Thu, 28 May 2009 09:03:29 +0000 (10:03 +0100)]
blktap2: fix tapdisk-channel.c
This patch fixes the following error.
cc1: warnings being treated as errors
In file included from usr/include/sys/resource.h:25,
from tapdisk-daemon.c:559:
usr/include/bits/resource.h: In function 'main':
usr/include/bits/resource.h:33: warning: ISO C90 forbids mixed
declarations and code
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Thu, 28 May 2009 09:02:57 +0000 (10:02 +0100)]
blktap2: fix makefile of blktap2
- clean up to use SUBDIRS-y
- With parallel make, libvhd might not be created before
link. guarantee it.
- use LDFLAGS for link which is set by upper level makefiles.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Thu, 28 May 2009 09:00:55 +0000 (10:00 +0100)]
udev-script: udev rule for pci_iomul device.
This patch adds a udev rule for the pci_iomul device to
create /dev/xen/pci_iomul.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Thu, 28 May 2009 08:51:43 +0000 (09:51 +0100)]
x86 vmx: Unrestricted guest (realmode) support
It allows fully virtualized guests to run real mode and unpaged mode
code natively in the VMX mode when EPT is turned on. With the
unrestricted guest there is no need to emulate the guest real mode
code in the vm86 container or in the emulator. Also the guest big real
mode code works like native.
This patch enhances Xen to use the unrestricted guest feature if
available on the processor. It also adds a new xen parameter to
disable the unrestricted guest feature at boot time.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Keir Fraser [Thu, 28 May 2009 08:41:59 +0000 (09:41 +0100)]
minios: implement ffs, ffsl and ffsll.
The first function is compiled only in case minios is compiled without
newlib, since newlib already provides an implementation for ffs.
On the other hand ffsl and ffsll are always compiled because newlib
misses those functions.
This patch also provides an implementation for __ffsti2 and __ffsdi2
because they are needed by gcc in order to successfully link ffsll.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 27 May 2009 20:51:04 +0000 (21:51 +0100)]
Update Xen version for 3.5-unstable
Keir Fraser [Wed, 27 May 2009 14:55:29 +0000 (15:55 +0100)]
x86: Fix 32-bit build.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 27 May 2009 13:03:09 +0000 (14:03 +0100)]
evtchn: Free pirq_to_evtchn/pirq_mask arrays on domain destruction.
At the same time, move this into evtchn_init/destroy().
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 27 May 2009 11:00:51 +0000 (12:00 +0100)]
[IA64] adjust ia64 xc_domain_restore() signature
This patch fixes the following error.
ia64/xc_ia64_linux_restore.c:546: error: conflicting types for
xc_domain_restore
./xenguest.h:49: error: previous declaration of xc_domain_restore was
here
make[4]: *** [ia64/xc_ia64_linux_restore.o] Error 1
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Wed, 27 May 2009 11:00:32 +0000 (12:00 +0100)]
[IA64] add ia64 _raw_rw_is_write_locked
This patch fixes the following link error.
xen/common/built_in.o: In function `_rw_is_write_locked':
xen/common/spinlock.c:249: undefined reference to
`_raw_rw_is_write_locked'
make[3]: *** [xen/xen-syms] Error 1
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Wed, 27 May 2009 10:29:38 +0000 (11:29 +0100)]
Fix up the synchronisation around grant table map track handles.
At present, we're not doing any at all, so if a domain e.g. tries to
do two map operations at the same time from different vcpus then you
could end up with both operations getting back the same maptrack
handle.
Fix this problem by just shoving an enormous lock around grant table
operations. This is unlikely to be heavily contended, because netback
and blkback both restrict themselves to mapping on a single vcpu at a
time (globally for netback, and per-device for blkback), and most of
the interesting bits are already protected by the remote domain's
grant table lock anyway.
The uncontended acquisition cost might be significant for some
workloads. If that were the case, it might be worth acquiring
the lock only for multi-vcpu domains, since we only manipulate the
maptrack table in the context of one of the domain's vcpus. I've not
done that optimisation here, because I didn't want to think about what
would happen if e.g. a cpu got hot-unplugged from a domain while it
was performing a map operation.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 27 May 2009 10:28:45 +0000 (11:28 +0100)]
xend: Device duplicate check fix
I've checked the duplicate-check code here and found that it checks
only within the context of one domain, not across domains. We
should check tap/vbd devices across domains so that another
guest is not allowed to use the same disk image in circumstances that
could corrupt a VM's disk.
The included patch denies disk image addition under these
circumstances:
1. We're adding read-only disk that's already used as write-exclusive
2. We're adding write-shared disk that's already used as
write-exclusive
3. We're adding write-exclusive disk that's already used
4. We're adding read-only disk that's already used as write-shared*
(because of I/O caching issues etc.)
The vif device duplicate check remains the same as it was: it is
checked in the context of the current domain only, so that behaviour has
been preserved.
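The four rules above can be sketched as a single predicate. This is an
illustration, not the xend implementation; the mode labels "r", "w" and
"w!" for read-only, write-exclusive and write-shared, and the function
name may_add_disk, are assumptions of the example:

```python
READ_ONLY = "r"         # assumed label for read-only
WRITE_EXCLUSIVE = "w"   # assumed label for write-exclusive
WRITE_SHARED = "w!"     # assumed label for write-shared

def may_add_disk(new_mode, modes_in_use):
    """Return True if a disk image may be added given its uses in other domains."""
    for used in modes_in_use:
        if new_mode == WRITE_EXCLUSIVE:
            return False  # rule 3: image already used at all
        if used == WRITE_EXCLUSIVE:
            return False  # rules 1 and 2: existing user is write-exclusive
        if new_mode == READ_ONLY and used == WRITE_SHARED:
            return False  # rule 4: read-only next to a write-shared user
    return True

print(may_add_disk(READ_ONLY, [READ_ONLY]))        # both read-only
print(may_add_disk(READ_ONLY, [WRITE_SHARED]))     # rule 4
print(may_add_disk(WRITE_EXCLUSIVE, [READ_ONLY]))  # rule 3
```

Combinations the list does not mention, such as two write-shared users,
are allowed by this sketch.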
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Wed, 27 May 2009 10:27:13 +0000 (11:27 +0100)]
x86 svm: Add support for Pause Filtering to AMD SVM
New AMD processors will support the Pause Filter Feature.
This feature creates a new field in the VMCB called Pause
Filter Count. If Pause Filter Count is greater than 0 and
intercepting PAUSEs is enabled, the processor will increment
an internal counter when a PAUSE instruction occurs instead
of intercepting. When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.
This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.
Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand. Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.
Processor support for this feature is indicated by a CPUID
bit.
On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 1%. Further performance
improvement may be possible with a more sophisticated
yield algorithm.
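The counting behaviour described above can be modelled in a few lines.
This is a toy simulation, not hardware documentation: the class name
PauseFilter is hypothetical, PAUSE interception is assumed to be enabled,
and resetting the counter after an intercept is an assumption of the
model:

```python
class PauseFilter:
    """Toy model of a pause filter count (assumes PAUSE intercepts are enabled)."""

    def __init__(self, filter_count):
        self.filter_count = filter_count
        self.counter = 0

    def on_pause(self):
        """Return True if this PAUSE instruction would be intercepted."""
        if self.filter_count == 0:
            return True  # no filtering: every PAUSE is intercepted
        self.counter += 1
        if self.counter >= self.filter_count:
            self.counter = 0  # assumed reset after the intercept
            return True
        return False

pause_filter = PauseFilter(3)
print([pause_filter.on_pause() for _ in range(6)])
```

With a count of 3000, a short spin never reaches the threshold, while a
VCPU spinning on a lock whose holder is descheduled eventually triggers an
intercept and can be rescheduled.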
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Keir Fraser [Wed, 27 May 2009 10:21:59 +0000 (11:21 +0100)]
xm: Specify sensible default for superpages domain config option.
Signed-off-by: Dave McCracken <dcm@mccr.org>
Keir Fraser [Wed, 27 May 2009 10:19:38 +0000 (11:19 +0100)]
rombios: fix trying to boot from next device
If boot="ndc", rombios cannot try to boot from the next device.
Because rombios jumps to the boot vector without pushing a return
address, gPXE code and the like cannot return if it fails to boot.
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Keir Fraser [Wed, 27 May 2009 10:17:40 +0000 (11:17 +0100)]
x86/hvm: fix off-by-one errors in vcpuid range checks
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 27 May 2009 10:16:27 +0000 (11:16 +0100)]
Remove unused 'retry' parameter from on_selected_cpus() etc.
Remove the unused "retry" parameter of on_selected_cpus(),
on_each_cpu(), smp_call_function(), and smp_call_function_single().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 27 May 2009 10:15:08 +0000 (11:15 +0100)]
Pass cpumasks by reference always.
Rather than passing cpumasks by value in all cases (which is
problematic for large NR_CPUS configurations), pass them 'by
reference' (i.e. through a pointer to a const cpumask).
On x86 this changes send_IPI_mask() to always only send IPIs to remote
CPUs (meaning any caller needing to handle the current CPU as well has
to do so on its own).
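The caller-side consequence can be sketched as follows. This is an illustrative Python model, not the Xen code: the function names, the set-based mask and the deliver callback are all assumptions, and Python cannot show the by-value versus by-pointer distinction itself.

```python
def send_ipi_mask(mask, current_cpu, deliver):
    """Model of the new behaviour: IPIs go to remote CPUs only."""
    for cpu in sorted(mask):
        if cpu != current_cpu:
            deliver(cpu)


def run_on_cpus(mask, current_cpu, func, deliver):
    """Caller pattern: IPI the remote CPUs, then handle the local
    CPU explicitly if it is part of the mask."""
    send_ipi_mask(mask, current_cpu, deliver)
    if current_cpu in mask:
        func(current_cpu)
```

A caller that previously relied on the local CPU being covered by the IPI now needs the explicit local call shown in run_on_cpus().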
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 27 May 2009 09:38:51 +0000 (10:38 +0100)]
x86: eliminate hard-coded NR_IRQS
... splitting it into global nr_irqs (determined at boot time) and
per-domain nr_pirqs (derived from nr_irqs and a possibly command line
specified value, which probably should later become a per-domain
config setting).
This has the (desirable imo) side effect of reducing the size of
struct hvm_irq_dpci from requiring an order-3 page to order-2 (on
x86-64), which nevertheless still is too large.
However, there is now a variable size bit array on the stack in
pt_irq_time_out() - while for the moment this probably is okay, it
certainly doesn't look nice. However, replacing this with a static
(pre-)allocation also seems less than ideal, because that would
require at least min(d->nr_pirqs, NR_VECTORS) bit arrays of
d->nr_pirqs bits, since this bit array is used outside of the
serialized code region in that function, and keeping the domain's
event lock acquired across pirq_guest_eoi() doesn't look like a good
idea either.
The IRQ- and vector-indexed arrays hanging off struct hvm_irq_dpci
could in fact be changed further to dynamically use the smaller of the
two ranges for indexing, since there are other assumptions about a
one-to-one relationship between IRQs and vectors here and elsewhere.
Additionally, it seems to me that struct hvm_mirq_dpci_mapping's
digl_list and gmsi fields could really be overlaid, which would yield
significant savings since this structure always gets instantiated in
the form of d->nr_pirqs (as per the above could also be the smaller of
this and NR_VECTORS) dimensioned arrays.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 27 May 2009 07:19:30 +0000 (08:19 +0100)]
tmem: build fix
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 26 May 2009 14:01:36 +0000 (15:01 +0100)]
x86 hvm: Allow cross-vendor migration
Intercept #UD and emulate SYSCALL/SYSENTER/SYSEXIT as necessary.
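A minimal sketch of the decode step, assuming the handler sees the raw instruction bytes at the faulting address. The function name is hypothetical, and instruction prefixes and the actual emulation are ignored here; the opcode bytes are the architectural encodings.

```python
# Second opcode byte (after the 0x0F escape) of the fast system call
# instructions.
FAST_SYSCALL_OPCODES = {
    0x05: "SYSCALL",
    0x34: "SYSENTER",
    0x35: "SYSEXIT",
}


def classify_ud(insn_bytes):
    """Return the name of the instruction to emulate, or None if the
    #UD should be left for the guest to handle."""
    if len(insn_bytes) >= 2 and insn_bytes[0] == 0x0F:
        return FAST_SYSCALL_OPCODES.get(insn_bytes[1])
    return None
```

Anything that does not match, such as UD2 (0F 0B), would still be delivered to the guest as a normal #UD.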
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 26 May 2009 10:52:31 +0000 (11:52 +0100)]
blktap2: a completely rewritten blktap implementation
Benefits to blktap2 over the old version of blktap:
* Isolation from xenstore - Blktap devices are now created directly on
the linux dom0 command line, rather than being spawned in response
to XenStore events. This is handy for debugging, makes blktap
generally easier to work with, and is a step toward a generic
user-level block device implementation that is not Xen-specific.
* Improved tapdisk infrastructure: simpler request forwarding, new
request scheduler, request merging, more efficient use of AIO.
* Improved tapdisk error handling and memory management. No
allocations on the block data path, IO retry logic to protect
guests from transient block device failures. This has been tested
and is known to work on weird environments such as NFS soft mounts.
* Pause and snapshot of live virtual disks (see xmsnap script).
* VHD support. The VHD code in this release has been rigorously
tested, and represents a very mature implementation of the VHD
image format.
* No more duplication of mechanism with blkback. The blktap kernel
module has changed dramatically from the original blktap. Blkback
is now always used to talk to Xen guests, blktap just presents a
Linux gendisk that blkback can export. This is done while
preserving the zero-copy data path from domU to physical device.
These patches deprecate the old blktap code, which can hopefully be
removed from the tree completely at some point in the future.
Signed-off-by: Jake Wires <jake.wires@citrix.com>
Signed-off-by: Dutch Meyer <dmeyer@cs.ubc.ca>
Keir Fraser [Tue, 26 May 2009 10:05:04 +0000 (11:05 +0100)]
Transcendent memory ("tmem") for Xen.
Tmem, when called from a tmem-capable (paravirtualized) guest, makes
use of otherwise unutilized ("fallow") memory to create and manage
pools of pages that can be accessed from the guest either as
"ephemeral" pages or as "persistent" pages. In either case, the pages
are not directly addressable by the guest, only copied to and fro via
the tmem interface. Ephemeral pages are a nice place for a guest to
put recently evicted clean pages that it might need again; these pages
can be reclaimed synchronously by Xen for other guests or other uses.
Persistent pages are a nice place for a guest to put "swap" pages to
avoid sending them to disk. These pages retain data as long as the
guest lives, but count against the guest memory allocation.
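The difference between the two pool types can be modelled roughly as below. This is a toy sketch and not the tmem interface: the class, its methods and the quota parameter are invented for illustration.

```python
class ToyTmemPool:
    """Toy model of a tmem pool; pages are only ever copied in and out."""

    def __init__(self, persistent, quota=None):
        self.persistent = persistent
        self.quota = quota  # hypothetical page limit for persistent pools
        self.pages = {}

    def put(self, key, data):
        """Copy a page in. A persistent put counts against the quota."""
        if (self.persistent and self.quota is not None
                and key not in self.pages
                and len(self.pages) >= self.quota):
            return False
        self.pages[key] = bytes(data)
        return True

    def get(self, key):
        """Copy a page out, or return None if it is no longer there."""
        return self.pages.get(key)

    def reclaim(self, count):
        """Hypervisor-side reclaim: only ephemeral pages may be dropped."""
        if self.persistent:
            return 0
        victims = list(self.pages)[:count]
        for key in victims:
            del self.pages[key]
        return len(victims)
```

A guest using an ephemeral pool must therefore cope with get() returning nothing, while a persistent pool keeps its pages but can refuse a put() once the guest's allocation is used up.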
Tmem pages may optionally be compressed and, in certain cases, can be
shared between guests. Tmem also handles concurrency nicely and
provides limited QoS settings to combat malicious DoS attempts.
Save/restore and live migration support is not yet provided.
Tmem is primarily targeted for an x86 64-bit hypervisor. On a 32-bit
x86 hypervisor, it has limited functionality and testing due to
limitations of the xen heap. Nearly all of tmem is
architecture-independent; three routines remain to be ported to ia64
and it should work on that architecture too. It is also structured to
be portable to non-Xen environments.
Tmem defaults off (for now) and must be enabled with a "tmem" xen boot
option (and does nothing unless a tmem-capable guest is running). The
"tmem_compress" boot option enables compression which takes about 10x
more CPU but approximately doubles the number of pages that can be
stored.
Tmem can be controlled via several "xm" commands and many interesting
tmem statistics can be obtained. A README and internal specification
will follow, but lots of useful prose about tmem, as well as Linux
patches, can be found at http://oss.oracle.com/projects/tmem .
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Tue, 26 May 2009 09:14:34 +0000 (10:14 +0100)]
uninstall: get rid of hardcoded paths
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Tue, 26 May 2009 09:13:43 +0000 (10:13 +0100)]
x86 hvm viridian: Provide dummy support for APIC assist page to satisfy Win7.
From: Tim Deegan <tim.deegan@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 26 May 2009 09:05:27 +0000 (10:05 +0100)]
pvgrub: few lines in shutdown_blkfront were removed by mistake.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 26 May 2009 09:04:10 +0000 (10:04 +0100)]
xm, xend: passthrough: Add assigned_or_requested_vslot()
Add an accessor to simplify accessing vslot if available,
otherwise requested_vslot.
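In outline, such an accessor could look like the sketch below. Only the function name and the two field names come from this changeset; the dict-based device representation and the error raised when neither field exists are assumptions.

```python
def assigned_or_requested_vslot(dev):
    """Return the assigned vslot if the device has one, otherwise the
    requested vslot. `dev` is assumed to be a dict-like device record."""
    if "vslot" in dev:
        return dev["vslot"]
    if "requested_vslot" in dev:
        return dev["requested_vslot"]
    raise KeyError("device has neither 'vslot' nor 'requested_vslot'")
```

Callers can then use one call instead of repeating the same fallback check at every site.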
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Tue, 26 May 2009 09:03:09 +0000 (10:03 +0100)]
xend: Fix xm pci-detach for inactive devices
In the case where a device is attached to an inactive domain
and then removed before the domain is activated it won't have
a vslot assigned, but it should still be valid to remove it.
I don't think that there are any other cases where vslot can
be invalid.
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Tue, 26 May 2009 09:01:54 +0000 (10:01 +0100)]
blkif: Clarify units for 'sector'-sized blkif request params.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 26 May 2009 08:58:38 +0000 (09:58 +0100)]
Add support for superpages (hugepages) in PV domain
This patch adds the option "superpages" to the domain configuration
file. If it is set, the domain is populated using 2M pages.
This code does not support fallback to small pages. If the domain
cannot be created with 2M pages, the create will fail.
The patch also includes support for saving and restoring domains with
the superpage flag set. However, if a domain has freed small pages
within its physical page array and then extended the array, the
restore will fill in those freed pages. It will then attempt to
allocate more than its memory limit and will fail. This is
significant because apparently Linux does this during boot, so a
freshly booted Linux image cannot be saved and restored successfully.
Signed-off-by: Dave McCracken <dcm@mccr.org>
Keir Fraser [Tue, 26 May 2009 08:54:53 +0000 (09:54 +0100)]
minios: replace mktime implementation
In the effort to clarify the MiniOS license it came to my attention that
a few portions of MiniOS were taken from other GPL projects; one of them
is the mktime implementation. This patch replaces the current
GPL-licensed mktime implementation with a different, BSD-licensed
version.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 26 May 2009 08:52:59 +0000 (09:52 +0100)]
PV-on-HVM: Define atomic_cmpxchg() for old Linux kernels.
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Tue, 26 May 2009 08:50:35 +0000 (09:50 +0100)]
stubdom: 'file' based disk sharing
Allow 'file' based disks, which are blkback based disks, to be shared
between the guest domain and the stubdom. It does so by exploiting the
same exception introduced in the previous patch "stubdoms phy disks
sharing". Now we can remove the hack in stubdom-dm that forces "file"
disks to be opened using blktap instead of blkback.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 26 May 2009 08:49:19 +0000 (09:49 +0100)]
minios: Fix blkfront driver when sector_size != 512
The first and last sector as well as the sector number of the request
are expressed in 512-byte units, independently of the real sector
size.
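A worked sketch of the unit conversion, assuming a 4096-byte page and a single-segment request. The helper name and its arguments are hypothetical; sector_number, first_sect and last_sect are the blkif request fields the description refers to.

```python
BLKIF_SECTOR = 512  # unit of the blkif fields, whatever the device reports
PAGE_SIZE = 4096    # assumed page size


def blkif_fields(byte_offset, page_offset, length, sector_size):
    """Translate one segment into blkif fields in 512-byte units.

    byte_offset: position on the device, in bytes
    page_offset: offset of the data buffer within its page, in bytes
    length:      transfer length in bytes
    sector_size: real sector size reported by the device
    """
    if length <= 0 or byte_offset % sector_size or length % sector_size:
        raise ValueError("request must be aligned to the device sector size")
    if page_offset % BLKIF_SECTOR or page_offset + length > PAGE_SIZE:
        raise ValueError("segment must fit within one page")
    return {
        "sector_number": byte_offset // BLKIF_SECTOR,
        "first_sect": page_offset // BLKIF_SECTOR,
        "last_sect": (page_offset + length - 1) // BLKIF_SECTOR,
    }
```

For a device with 4096-byte sectors, a full-page read at byte offset 8192 gives sector_number 16 and last_sect 7; dividing by the real sector size instead would wrongly give 2 and 0.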
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Thu, 21 May 2009 03:31:47 +0000 (04:31 +0100)]
xend: Fix typo in usage of new auxbin.xen_configdir() function
Signed-off-by: Keir Fraser <keir.fraser@eu.citrix.com>
Keir Fraser [Wed, 20 May 2009 15:02:50 +0000 (16:02 +0100)]
ACPI/NUMA: Improve SRAT parsing
This is to properly handle SRAT rev 2 extended proximity domain
values.
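For the processor affinity entries, the extended value is split into a low byte and three high bytes. A minimal sketch of reassembling it is shown below; the function name is hypothetical, and gating the high bytes on the table revision is an assumption about how the parsing is done, not a description of this patch.

```python
def cpu_proximity_domain(lo, hi, srat_revision):
    """Combine the split proximity domain fields of an SRAT processor
    affinity entry.

    lo: the low 8 bits of the proximity domain
    hi: sequence of 3 bytes holding bits 8..31
    """
    pxm = lo & 0xFF
    if srat_revision >= 2:
        # Only trust the upper 24 bits for rev 2 and later tables.
        pxm |= (hi[0] << 8) | (hi[1] << 16) | (hi[2] << 24)
    return pxm
```

With a rev 1 table the high bytes are ignored, so stale data in those bytes cannot produce a bogus domain number.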
Also a first step to eliminate the redundant definitions of
ACPI provided table structures (Linux eliminated all of the duplicates
from include/linux/acpi.h in 2.6.21).
Portions based on a Linux patch from Kurt Garloff <garloff@suse.de>
and Alexey Starikovskiy <astarikovskiy@suse.de>.
IA64 build tested only.
Signed-off-by: Jan Beulich <jbeulich@novell.com>